Parallel Fourier Transformations using shared memory nodes

Author

  • Solon Pissis
Abstract

The Fast Fourier Transform (FFT) is of great importance for various scientific applications used in High Performance Computing (HPC). However, a detailed performance analysis shows that the FFT routines used in these applications prevent them from scaling to large processor counts. The limiting factor appears to be the All-to-All communication required inside these transformation routines, which becomes extremely costly when large processor counts are involved. In this dissertation, we mainly focus on whether and how the performance of the parallel two-dimensional (2D) FFT can be improved by exploiting access to the shared memory nodes of HPCx, a cluster of POWER5 SMP nodes. In particular, we investigate how to efficiently transfer data between the processing elements involved in the parallel 2D FFT. Different OpenMP strategies are proposed for the parallelisation of the 2D FFT. The results demonstrate that, for certain problem sizes between 16 and 8192, access to the shared memory of an HPCx node (16 processors) can produce gains in performance compared to the MPI implementation. In addition, for large processor counts, we use our results from the 2D case to optimise the parallelisation of the three-dimensional (3D) FFT with the Hybrid model, a mixed-mode programming model that combines shared memory programming and message passing. In our implementation, we use the Master-only style, a version of the Hybrid model in which the MPI communication is handled only by the master thread, outside the OpenMP parallel regions. The results demonstrate good scaling of the code for problem sizes between 64 and 512 on up to 1024 processors. The performance comparisons illustrate that, in certain cases, the Hybrid model can prove beneficial compared to the 2D data decomposition with pure MPI.

Subject area: High Performance Computing
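The data movement the abstract refers to can be illustrated with a small sketch of the standard row-column 2D FFT: transform every row, transpose, transform every row again, and transpose back. In a distributed-memory version the transpose is where All-to-All communication would occur; with shared memory it is a local rearrangement. This is not the dissertation's code. The names `fft1d`, `transpose`, `fft2d`, and the `workers` parameter are hypothetical, and a Python thread pool is used only as a stand-in for OpenMP threads (it shows the structure, not a speedup).

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def fft2d(grid, workers=4):
    """Row-column 2D FFT. Threads share the grid, so the transpose
    needs no message passing (assumed stand-in for an OpenMP version)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Step 1: independent 1D FFTs over the rows.
        rows = list(pool.map(fft1d, grid))
        # Step 2: transpose so columns become rows, then 1D FFTs again.
        cols = list(pool.map(fft1d, transpose(rows)))
    # Step 3: transpose back to the original orientation.
    return transpose(cols)
```

Calling `fft2d(grid)` on a list of equal-length rows, where both dimensions are powers of two, returns the 2D transform as a list of lists of complex numbers.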

Similar resources

Multigrain Shared Memory

Parallel workstations, each comprising a 10-100 processor shared memory machine, promise cost-effective general-purpose multiprocessing. This thesis explores the coupling of such small- to medium-scale shared memory multiprocessors through software over a local area network to synthesize larger shared memory systems. Multiprocessors built in this fashion are called Distributed Scalable Shared memo...

"Slow Is Fast" for Wireless Sensor Networks in the Presence of Message Losses

Transformations from shared memory model to wireless sensor networks (WSNs) quickly become inefficient in the presence of prevalent message losses in WSNs, and this prohibits their wider adoption. To address this problem, we propose a variation of the shared memory model, the SF shared memory model, where the actions of each node are partitioned into slow actions and fast actions. The tradition...

Shaman: A Distributed Simulator for Shared Memory Multiprocessors

This paper describes our distributed architectural simulator of shared memory multiprocessors named Shaman. The simulator runs on a PC cluster that consists of multiple front-end nodes to simulate the instruction level behavior of a target multiprocessor in parallel and a back-end node to simulate the target memory system. The front-end also simulates the logical behavior of the shared memory u...

Compiling MPI for Many-Core Systems

Processors with multiple (or many) cores and shared memory are becoming ubiquitous across the computing spectrum. MPI, the current de facto programming model for scalable parallel applications, enforces copies between source and target processes and thus cannot fully utilize the shared memory and cache architectures of modern machines. To enable MPI-based programs to more fully exploit features of...

Parallel Fourier-Motzkin Elimination

Fourier-Motzkin elimination is a computationally expensive but powerful method to solve a system of linear inequalities for real and integer solution spaces. Because it yields an explicit representation of the solution set, in contrast to other methods such as Simplex, one may, in some cases, take its longer run time into account. We show in this paper that it is possible to considerably speed ...

Publication date: 2008